Privacy-preserving GWAS analysis on federated genomic datasets

نویسندگان

  • Scott D Constable
  • Yuzhe Tang
  • Shuang Wang
  • Xiaoqian Jiang
  • Steve Chapin
چکیده

BACKGROUND The biomedical community benefits from the increasing availability of genomic data to support meaningful scientific research, e.g., Genome-Wide Association Studies (GWAS). However, high quality GWAS usually requires a large amount of samples, which can grow beyond the capability of a single institution. Federated genomic data analysis holds the promise of enabling cross-institution collaboration for effective GWAS, but it raises concerns about patient privacy and medical information confidentiality (as data are being exchanged across institutional boundaries), which becomes an inhibiting factor for the practical use. METHODS We present a privacy-preserving GWAS framework on federated genomic datasets. Our method is to layer the GWAS computations on top of secure multi-party computation (MPC) systems. This approach allows two parties in a distributed system to mutually perform secure GWAS computations, but without exposing their private data outside. RESULTS We demonstrate our technique by implementing a framework for minor allele frequency counting and χ2 statistics calculation, one of typical computations used in GWAS. For efficient prototyping, we use a state-of-the-art MPC framework, i.e., Portable Circuit Format (PCF) 1. Our experimental results show promise in realizing both efficient and secure cross-institution GWAS computations.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

SAFETY: Secure gwAs in Federated Environment Through a hYbrid solution with Intel SGX and Homomorphic Encryption

Recent studies demonstrate that effective healthcare can benefit from using the human genomic information. For instance, analysis of tumor genomes has revealed 140 genes whose mutations contribute to cancer 1. As a result, many institutions are using statistical analysis of genomic data, which are mostly based on genome-wide association studies (GWAS). GWAS analyze genome sequence variations in...

متن کامل

Privacy Preserving Principal Component Analysis Clustering for Distributed Heterogeneous Gene Expression Datasets

In this paper, we present approaches to perform principal component analysis (PCA) clustering for distributed heterogeneous genomic datasets with privacy protection. The approaches allow data providers to collaborate together to identify gene profiles from a global viewpoint, and at the same time, protect the sensitive genomic data from possible privacy leaks. We then further develop a framewor...

متن کامل

A Privacy-Preserving GWAS Computation with Homomorphic Encryption

The continuous decline in the cost of DNA sequencing has contributed both positive and negative feelings in the academia and research community. It has now become possible to harvest large amounts of genetic data, which researches believe their study will help improve preventive and personalised healthcare, better understanding of diseases and response to treatments. However, there are more inf...

متن کامل

MetaSeq: Privacy Preserving Meta-Analysis of Sequencing-Based Association Studies

Human genetics recently transitioned from GWAS to studies based on NGS data. For GWAS, small effects dictated large sample sizes, typically made possible through meta-analysis by exchanging summary statistics across consortia. NGS studies groupwise-test for association of multiple potentially-causal alleles along each gene. They are subject to similar power constraints and therefore likely to r...

متن کامل

Scalable privacy-preserving data sharing methodology for genome-wide association studies: an application to iDASH healthcare privacy protection challenge

In response to the growing interest in genome-wide association study (GWAS) data privacy, the Integrating Data for Analysis, Anonymization and SHaring (iDASH) center organized the iDASH Healthcare Privacy Protection Challenge, with the aim of investigating the effectiveness of applying privacy-preserving methodologies to human genetic data. This paper is based on a submission to the iDASH Healt...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره 15  شماره 

صفحات  -

تاریخ انتشار 2015